[57] ABSTRACT

A method for determining the relationship of chemical structure to biological activity based on the topology and physiochemical properties of "cavities" or "artificial constructs" constructed from molecular models of nucleic acids, including double-stranded DNA, double-stranded RNA and double-stranded DNA-RNA complexes. With DNA models, the second codon base is removed from each of the sixty-four possible codon-anticodon complexes in the configuration of DNA to form the cavities. Cavities were also formed between the base pairs of partially uncoiled DNA. Using the conventional physiochemical principles of hydrogen bonding and steric constraints, molecules having varying types of biological activity will fit stereochemically into certain cavities while, conversely, molecules which do not form complementary fits into a given cavity will not possess the respective biological activity. The method can also be used to determine the degree of biological activity of compounds.

8 Claims, 11 Drawing Figures
FIG. 5. The complementary hydrogen bond pairing of L-amino acid R groups and adjacent nucleic acid bases permitted in DNA "cavities".

Amino Acid       Adjacent Base*   R Group-Base Interaction
Arginine         C                -NH···N3(C); -NH···O2(C)
Aspartic Acid    T                -COO⁻···HN3(T)
Asparagine       T                -CO···HN3(T); -NH···O4(T)
Cysteine         C                -S-H···N3(C)
Glutamic Acid    T                -COO⁻···HN3(T)
Glutamine        T                -CO···HN3(T); -NH···O4(T)
Histidine        T                -N···HN3(T)
Lysine           T, C             -NH3···O4(T); -NH3···O2(C)
Serine           C, G             -OH···N3(C); H-O···HN1(G)
Threonine        C, G             -OH···N3(C); H-O···HN1(G)
Tryptophan       C                N-H···N3(C)
Tyrosine         T                -OH···O4(T)

* Underlined bases are the second bases in the anticodons for the amino acid listed.
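The footnote to FIG. 5 singles out, among the listed bases, those that are second bases of the anticodons. As a cross-check, the minimal Python sketch below (an illustration under stated assumptions, not part of the described method) derives those bases from the standard genetic code written with DNA codons, taking the second anticodon base to be the Watson-Crick complement of the second codon base. The constants `AMINO_ACIDS` and `CODON_TABLE` and the helper `adjacent_bases` are hypothetical names introduced here.

```python
from itertools import product

# Standard genetic code (assumed; not tabulated in this text). DNA codons are
# generated in TTT, TTC, TTA, TTG, TCT, ... order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def adjacent_bases(amino_acid: str) -> set[str]:
    """Second anticodon bases (complements of the second codon base)
    over all codons of a one-letter amino acid code."""
    return {
        COMPLEMENT[codon[1]]
        for codon, aa in CODON_TABLE.items()
        if aa == amino_acid
    }


if __name__ == "__main__":
    # One-letter codes for the amino acids listed in FIG. 5.
    fig5 = {
        "Arginine": "R", "Aspartic Acid": "D", "Asparagine": "N",
        "Cysteine": "C", "Glutamic Acid": "E", "Glutamine": "Q",
        "Histidine": "H", "Lysine": "K", "Serine": "S",
        "Threonine": "T", "Tryptophan": "W", "Tyrosine": "Y",
    }
    for name, code in fig5.items():
        print(f"{name:<14} {', '.join(sorted(adjacent_bases(code)))}")
```

On this reading, serine has two second anticodon bases (C and G), while threonine has only G and lysine only T, so the (C) interactions listed for threonine and lysine pair the R group with a base other than the second anticodon base.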
METHOD OF PREDICTING BIOLOGICAL ACTIVITY OF COMPOUNDS BY NUCLEIC ACID MODELS

CROSS-REFERENCE TO RELATED APPLICATION

This is a continuation-in-part application of the co-pending prior application of Lawrence B. Hendry, et al., Ser. No. 335,589, filed Dec. 29, 1981, entitled "Method of Predicting Biological Activity of Compounds by DNA Models", now expressly abandoned as of the filing date granted this application.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates generally to the determination of the biological function of molecules and, more specifically, to a method of predicting the biological activity of compounds using models of nucleic acids, including DNA, RNA and DNA-RNA complexes.

2. Description of the Prior Art

DNA (deoxyribonucleic acid) is a repeating polymeric structure which has two primary components: a deoxyribose-phosphate backbone and a series of nucleic acid bases stacked in a helical pattern. The DNA molecule contains the genetic code, which is generally recognized as a universal language used in all living systems. The code is divided into triplet sections, each formed from a sequence of three (3) nucleic acid bases and each influencing the coding for a specific amino acid. In double-stranded nucleic acids the bases are paired, giving rise to a coiled, double helical structure.

Twenty years have passed since the discovery of the genetic code; the exact nature of the relationship between the sequences of three consecutive nucleotide bases known as codons and the unique group of twenty L-amino acids involved in protein synthesis is, however, still uncertain. While there have been many descriptions of physicochemical relationships between amino acids and the purine and pyrimidine moieties of their codons, a satisfactory stereochemical explanation of the code remains to be established. Thus, how each of the amino acids in a protein sequence came to be related to three (3) nucleic acid bases in a nucleic acid sequence has not been elucidated. This state of affairs prompted Crick to propose that the code might be a "frozen accident" of the evolutionary process while nonetheless advising that "it is therefore essential to pursue the stereochemical theory." Crick, F. H. (1968), J. Mol. Biol., vol. 38, pages 367-379.

In the search for various stereochemical approaches to the genetic code, e.g., Hendry, L. B., et al. (1979), Persp. Biol. Med., vol. 22, pages 333-345, it was discovered that many of the R groups of the twenty L-amino acids are similar in structure to the purine (adenine and guanine) and pyrimidine (thymine and cytosine) bases of DNA (FIG. 1A). When the α-amino group of an amino acid is positioned at N-9 of a purine or N-1 of a pyrimidine as shown in FIG. 1B, the R group can assume a conformation in which the atomic arrangements are like those of a purine or a pyrimidine in a complementary Watson-Crick base pair. Amino acids with hydrophilic moieties appear to be capable of forming complementary hydrogen bonding pairs with nucleic acid bases which are analogous to those in base pairs. Hydrophobic amino acids can, in many cases, form complementary Van der Waals surfaces with one of the bases (FIG. 1B). The complementarity of Watson-Crick nucleic acid base pairs and the putative complementary pairing of structurally analogous amino acids with bases are illustrated in A-F of FIG. 1B: (A) cytosine(C)-guanine(G) base pair; (B) cytosine-arginine(ARG) pair; (C) proline(PRO)-guanine pair; (D) thymine(T)-adenine(A) pair; (E) thymine-histidine(HIS) pair; and (F) isoleucine(ILE)-adenine pair. For fourteen of the twenty L-amino acids, only one complementary amino acid-base pair is possible; in each case, the base is the second in its anticodon.

The above-described structural analogies between L-amino acids and nucleic acid bases suggested that it might be possible to employ modelling techniques to incorporate amino acids directly into DNA as if they were bases, with apparent stereochemical specificity and without disrupting the double helix configuration.

SUMMARY OF THE INVENTION

The invention described herein is a new method for, among other things, predicting the biological activity (as drugs, toxins, growth regulators, etc.) of molecules, as well as the degree of biological activity of such compounds. The term "biological activity" means the stimulation or inhibition of cellular physiology, as in interference with the pathophysiology of disease, including neoplasia, or carcinogenic, teratologic or other cytotoxic effects. Conventional methods already exist for predicting activity based on the existing structures of molecules of known biological activity. The present invention is based on the concept that the structures and the related physiochemical properties of all biologically active molecules are reflected in the structure of the nucleic acids, in light of the established hypothesis that the genetic information for the production and chemoreception of naturally occurring biologically active molecules is in DNA. In other words, there must be a "blueprint" in DNA for the production and biological function of all natural products.

A novel set of "artificial constructs" has been developed, based in part upon the structure of DNA, RNA and/or DNA-RNA complexes. The "artificial constructs" have no known existence in nature but provide a means for a topological and physiochemical understanding of the "blueprint" for biological activity and function. The "blueprint" can be viewed in three dimensions as a series of lock and key fits of molecules to the "artificial constructs".

In essence, the double-stranded helical structure of DNA has been used to construct cavities which reflect either artificial spaces created by removing a single base from the helix or artificial spaces created between base pairs upon uncoiling the helix. Thus, depending upon which nucleotides are in a given base sequence in DNA, several unique "artificial constructs" of these cavities can be made which have different shapes, sizes and physiochemical properties. The "artificial constructs" are descriptions of the properties of configurations in DNA which are not known to exist in nature. Molecules having varying types of biological activity will fit in a stereochemically complementary fashion into certain of the "artificial constructs." The structures of molecules with similar activity can be correlated with their fits while, conversely, molecules which do not form complementary fits into a given cavity will not possess the respective biological activity.
The potential scope of the present invention involves predictions of biological activity for: (1) naturally occurring molecules whose structures are known; and (2) non-naturally occurring molecules that have been synthesized. It is also possible to use the method to conceive of the structures and predict the activities of: (1) new synthetic molecules; (2) metabolites of new synthetic molecules; (3) natural products whose structures have not yet been elucidated; and (4) structures of metabolites of natural products whose structures have not been elucidated. Examples of the types of molecules whose activities can be predicted are: peptide hormones, androgens, carcinogens, neurotransmitters, estrogens, teratogens, adrenergics, glucocorticoids, ionophores, plant hormones, cholecalciferols, neuroleptics, pheromones, vitamins, herbicides, histamines, anticonvulsants, insecticides, antihistamines, antibiotics, chemical warfare agents, antiestrogens, and sedatives and hypnotics.

When the present invention is used to predict the degree of biological activity of compounds (natural or synthetic), the compounds are fitted into the cavities formed from unwinding the nucleic acid and ranked according to their fit and complementary hydrogen bonding.

DESCRIPTION OF THE FIGURES OF THE DRAWINGS

FIGS. 1A and B are structural analogies between L-amino acids and nucleic acid bases;

FIG. 2 illustrates the removal of the middle base from a triplet base sequence in DNA and the complementary physiochemical fit of an amino acid in the resulting cavity;

FIG. 3 illustrates the construction of "cavities" in DNA;

FIGS. 4A-C show two-dimensional profiles depicting the overlay of triplets of bases in double-stranded DNA after the middle base has been removed;

FIG. 5 sets forth the complementary hydrogen bond pairing of amino acid R groups and adjacent nucleic acid bases permitted in DNA "cavities";

FIG. 6 is an illustration of the fit of L-histidine into the cavity formed from one of its codons (CAC); and

FIG. 7 lists the stereochemical fit of L-amino acids into the cavities constructed from their codons.

DETAILED DESCRIPTION OF THE ILLUSTRATIVE EMBODIMENTS

I. "Artificial Constructs" of Stereochemical Complementary "Cavities" Made by Removal of a Base in DNA

A. Theory of "Cavities"

Using Corey-Pauling-Koltun (CPK) space-filling molecular models, Kendrew models and the National Institutes of Health (NIH) X-ray computer graphics system, it was found that amino acids could be incorporated with apparent stereochemical specificity into DNA without disrupting the double helix. This was especially evident when the amino acids were placed in cavities created by removal of the second base of their codons and were thus paired with the adjacent second base of their anticodons (see FIG. 2).

Referring to FIG. 2, (A) is a diagram of the triplet sequence CAC, one of the codons for histidine, in mRNA; (B) is a diagram of double-stranded DNA with the corresponding codon and anticodon base sequences, in which the first, second and third bases are indicated by Roman numerals and the anticodon bases by *; (C) when the second base of the codon is removed from the CAC codon sequence, the resultant "cavity" is designated by a shaded area, and TII*, the second base of the anticodon, has been termed the "adjacent" base; (D) illustrates the insertion of L-histidine into the "cavity" formed by removal of the second base of its codon CAC; and (E-H) show CPK models corresponding to the diagram above.

The concept of pairing amino acids and nucleotide bases, and the relative importance attached to the role of the second base position in the triplet code, is supported by the established correlation of physiochemical properties of amino acids with nucleotide bases, in particular with the second anticodon base. Studies of protein structure and protein synthesis have also suggested that the second base is the most influential in determining the chemical characteristics of the amino acid coded for, as well as in limiting errors of translation.

In order to pursue a specific stereochemical rationale for the genetic code further, it was necessary to include the various possible chemical interactions of amino acid R groups and nucleic acid bases in the model building. Spectroscopic studies have previously shown that amino acid R groups could interact directly with DNA and RNA not only by forming hydrogen bonds with the functional groups of the bases but also by intercalating and forming stereospecific hydrophobic stacking complexes between bases. On closer examination of the stereochemical and related physiochemical factors which allowed the incorporation of amino acids into models of DNA with such ease and specificity, it became apparent that the stereochemical influences of the neighboring first and third bases of the codon (FIG. 2) were important to the molecular topography of the space in the B-DNA helix into which amino acids were fitted, and that these influences needed definition.

The prediction method of the present invention applies to all nucleic acids, including double-stranded DNA, double-stranded RNA and double-stranded DNA-RNA complexes. Models based upon B-DNA rather than other forms of DNA or RNA were employed because the X-ray structure of B-DNA is known. mRNA-tRNA complexes, for instance, could be utilized; they might be of more direct relevance to the code and would also allow the 2' hydroxyl group of ribose to form a covalent linkage to the carboxyl group of an amino acid (the linkage known to exist in tRNA), potentially providing a more appropriate anchor for the amino acid in the cavity. The method of this invention would proceed slightly differently with RNA and DNA-RNA complexes than with DNA. This stems from the fact that the stereochemistry of the two categories is slightly different. Whereas the B form of DNA has a uniform, defined X-ray structure which is identical from one strand to another, RNA and DNA-RNA complexes have many different forms, and only some of these have been defined in terms of their X-ray structure. In other words, DNA's structure is absolute, whereas those of RNA and DNA-RNA complexes are relative. Therefore, it is not feasible to describe the structure of RNA or DNA-RNA complexes in general; they can only be described as individual structures. Nevertheless, for any given RNA or DNA-RNA complex whose X-ray structure is known, the same steps can be followed as for DNA, and the prediction method and results will still be valid.
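The correlation between amino acid character and the second codon base, and the point that the same steps carry over to RNA, can be checked against the standard codon table. The Python sketch below is illustrative only: the codon table is the standard genetic code (an assumption, since it is not tabulated in this text), `by_second_base` and `two_codon_classes_match` are hypothetical helper names, and the `rna` flag simply writes uracil (U) in place of thymine (T).

```python
from itertools import product

# Standard genetic code (assumed). DNA codons in TTT, TTC, TTA, TTG, TCT, ...
# order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
PURINES = {"A", "G"}


def by_second_base(rna: bool = False) -> dict[str, set[str]]:
    """Group one-letter amino acid codes by the second base of their codons."""
    groups: dict[str, set[str]] = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons code for no amino acid
        base = codon[1]
        if rna and base == "T":
            base = "U"  # uracil replaces thymine in RNA
        groups.setdefault(base, set()).add(aa)
    return groups


def two_codon_classes_match() -> bool:
    """True if every amino acid with exactly two codons has third bases
    that are both purines or both pyrimidines."""
    codons_by_aa: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        codons_by_aa.setdefault(aa, []).append(codon)
    for aa, codons in codons_by_aa.items():
        if aa == "*" or len(codons) != 2:
            continue
        first, second = codons
        if (first[2] in PURINES) != (second[2] in PURINES):
            return False
    return True


if __name__ == "__main__":
    for base, aas in sorted(by_second_base(rna=True).items()):
        print(base, "".join(sorted(aas)))
    print(two_codon_classes_match())
```

On the standard table, second base T (U in RNA) collects F, L, I, M and V, and second base A collects Y, H, Q, N, K, D and E, in line with the hydrophobic and hydrophilic grouping stated in this description; the second helper confirms the third-base regularity for two-codon amino acids.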

Refinements in the structure of the B form of DNA remain to be investigated, such as variations in the angle of twist of bases about the helix axis, which have recently become evident. Certain conformations of DNA, such as Z-DNA, would, of course, yield a very different series of cavities and potentially different stereochemical interactions with small molecules. Cavities have also been constructed using the pyrimidine uracil, which occurs in RNA in place of thymine; while the pattern of fits of the amino acids into the resultant cavities is slightly different, it appears to correlate just as well with the genetic code.

Scale models of the space formed by the removal of a base from CPK models of DNA were constructed (FIG. 2). Each of the resulting sixty-four possible spaces is simply called a "cavity" for lack of a better term, and each possesses unique physiochemical and topological features. The "cavities" present evidence for: (1) a stereochemical correlation between individual DNA cavities which is consistent with the general pattern of the genetic code; (2) a stereochemical fit (in some cases apparently "lock and key") of the amino acids into certain specific cavities; and (3) a remarkable correlation of the fits of the amino acids in cavities to the genetic code.

While the present approach is derived from conventional physiochemical relationships between base pairs in DNA (including hydrogen bonding and stacking interactions), as well as from steric constraints related to the known position of the bases in a B-DNA double helix, the cavities are no more than artificial constructs. Thus, it is not suggested that the cavities are real, that they are germane to the evolution of the code, or that they are involved in the mechanism of transcription or of translation. It is believed, however, that the cavities provide new evidence that there may be a stereochemical rationale for the genetic code.

B. Construction of the "Cavities"

Corey-Pauling-Koltun models of the B form of DNA were made which approximate the published X-ray coordinates of the NIH computer graphics system. A single base was then removed without disrupting the remainder of the DNA structure (see FIG. 2). To simplify the modeling of the resulting cavity, the DNA was reconstructed without the sugar-phosphate backbone, with flat profiles of the neighboring 5' and 3' bases (the first (I) and the third (III) bases, respectively). The "adjacent" second base (II*) is used as a spacer (see FIG. 3). The dyad axis serves as a convenient reference point to fix the positions of the bases accurately. Because of the symmetrical properties of B-DNA, the relative position of the deoxyribose-phosphate backbone does not vary significantly with base sequence; the contribution of the backbone can therefore be considered constant for each cavity.

To define topology, the cavities are filled with wax, contouring the surfaces by connecting the closest points on the outside perimeter of the profiles of the neighboring 5' and 3' base pairs. The positive wax image of a cavity is thus an approximation of the overlap of the surfaces of atomic orbitals between the "adjacent" middle base of one strand and the neighboring 5' and 3' bases of the opposite strand. The wax figures are then removed and used to recreate cavities as negative images in Silastic RTV polymer (Dow Corning) (FIG. 3). A detailed description of the technical aspects of construction can be found elsewhere: Kirbo, L. (1981), Master's thesis, Medical College of Georgia.

FIGS. 3A-J illustrate the construction of a cavity model in DNA:

(A) cytosine(C)-guanine(G): photographic profiles of CPK models for DNA base pairs;

(B) adenine(A)-thymine(T);

(C) a two-dimensional overlay of profiles of base pairs: the double-stranded DNA codon sequence CAC on the left, in the 5' to 3' direction, and the anticodon sequence GTG on the right, in the 3' to 5' direction;

(D) the same profiles with the second codon base A removed;

(E) the three-dimensional form used to define the "cavity" formed by the removal of A from CAC as in FIG. 3B above; the boundaries of the cavity are defined by the surface of the adjacent base, here TII*, and the shortest vertical lines connecting the profiles of the base pairs I and III;

(F) a wax mold is made of the cavity shown in FIG. 3E;

(G) the third base pair (CIII-GIII*) has been removed;

(H) positive wax mold of the cavity;

(I) negative image of the "cavity" in RTV polymer, made from the wax positive shown in FIG. 3H, representing the cavity derived from the CAC codon; and

(J) fit of a CPK model of L-histidine in the cavity formed from the CAC codon.

In this first approximation of the cavities, the "knobby" features of the neighboring bases (i.e., the surfaces of orbitals), which would be shown in really accurate maps of their topology, have been ignored. The less detailed models greatly simplify comparison of the cavities to one another. Examination of models which include the entire surface indicates that the topology of the cavities is not markedly affected by these features.

C. Other Methods to Construct "Cavities"

There are a number of alternative ways to construct the cavities described above. For example, instead of using CPK models of DNA, the absolute X-ray space-filling coordinates of DNA (from the NIH Computer Graphics Center) have been used to construct the same cavities (to the scale of CPK models), with the exception that the positive images were constructed from "Oasis," a commercially available material used by florists.

D. Relationship of the "Cavities" to One Another and to the Pattern of the Genetic Code

In the genetic code shown in FIG. 7, an amino acid can be coded for by as few as one or as many as six triplet codons in the 5' to 3' direction. Most of the sixty-four codons are redundant with respect to the third base. If an amino acid has two codons, their third bases are always either both purines or both pyrimidines. If an amino acid has four codons, there is redundancy only in the third base position.

Further inspection of the codon catalogue reveals that all amino acids having T (U in RNA) as the second codon base (phenylalanine, leucine, isoleucine, methionine and valine) are hydrophobic; conversely, all amino acids with A as the second codon base possess hydrogen bonding R groups and are hydrophilic. Some amino acids which are closely related in chemical structure have codons which differ only in a single base, e.g.,
asparagine (AAC, AAT)-aspartic acid (GAC, GAT); glutamine (CAA, CAG)-glutamic acid (GAA, GAG); phenylalanine (TTC, TTT)-tyrosine (TAT, TAC); valine (GTX)-isoleucine (ATC, ATT, ATA). There are two codons, related by a single base change, which never code for any amino acid: TAA and TAG. These codons and TGA, which in some instances codes for tryptophan, are "stop" codons which provide signals for the termination of translation.

Each of the sixty-four cavities constructed from the sixty-four possible triplet sequences in DNA has a unique size, shape and set of physiochemical features. In FIGS. 4A-C, the effects of changing a base on one cavity formed from the codon sequence CAC (the middle base A was removed) are illustrated by the two-dimensional overlap of base profiles and by the shape of "center slices" of the cavities, i.e., silhouettes drawn from a slice parallel to the base pairs through the middle of each cavity.

Regardless of the particular cavity being studied, changes in the 3' or third base generally affect the size, shape and physiochemical properties least because of the right-handed helical coiling of DNA [i.e., less of the third base overlaps the cavity (FIGS. 4A-C)]. Due to their structural similarities, a change in the third base from pyrimidine to pyrimidine (CAC/CAT) or purine to purine (CAA/CAG) affects the cavity less than a purine-pyrimidine change (e.g., CAT/CAA). Changes in the middle base position (e.g., CAC/CCC) have little impact on the overall shape; those changes obviously have a substantial effect on the volume of cavities but can alter their shapes considerably.

The features of the cavities which are revealed by changing bases reflect the pattern of the genetic code. The minimal effect of changes in the third base on cavities, in particular purine to purine or pyrimidine to pyrimidine transitions, in comparison to changes in the first or second base, is consistent with the overall redundancy of the third base in the genetic code. For example, the cavities constructed from CAG and CAA are almost identical to each other; both of these are codons for L-glutamine. The cavities constructed from codons CAC and CAT, both of which code for L-histidine, are different from those associated with L-glutamine. Changing the first or second base of CAC, one of the L-histidine codons, gives rise to very different cavities; for example, AAC and CGC code for L-asparagine and L-arginine, respectively.

The importance of the second base to the physiochemical characteristics of the cavities is consistent with the physiochemical grouping of amino acids by the second base of their codons. All amino acids having codons with the second base A have hydrophilic R groups; the cavities created from their codons are all related by the same adjacent base (T). Conversely, cavities derived from all codons with the second base T have the adjacent base A and code for hydrophobic amino acids. The first codon base, which has the greatest effect on the shape of the cavity, also seems to be of some significance to the pattern of the code. For example, some of the structurally related amino acids [those with phenyl rings (phenylalanine-TTC/TTT; tyrosine-TAC/TAT) and chiral R groups (threonine-ACC/ACT/ACA/ACG; isoleucine-ATC/ATT/ATA)] have codons which have the same first base but differ in the second base. The structures of these amino acids and the cavities associated with their codons have very similar shapes (not shown). The relationship of cavities to the code is even more compelling when cavities constructed from two of the three stop codons (TAA/TAG) are examined. They have topological and physiochemical properties which are closely related to each other but are distinct from the other sixty-two. The third stop codon, TGA, which occasionally codes for tryptophan, has a similar shape to TAA and TAG but differs in hydrogen bonding character due to the difference in the second base (G versus A).

E. Utility of the "Cavities"

The cavities constructed above have been used to fit Corey-Pauling-Koltun models of L-amino acids. All of the L-amino acids used for the synthesis of protein which possess R groups fit specifically into cavities which are formed by removal of the second base of their codons. Fits of the CPK models of each of the twenty L-amino acids into DNA were demonstrated as follows. The α-amino group of the amino acid was placed in a position in the cavity where the missing purine or pyrimidine would have been attached to deoxyribose (at N-9 or N-1, respectively). The possible conformations of the amino acid models to be considered as fits were subject only to three constraints:

1. Amino acids having R groups with hydrogen bonding heteroatoms were placed so that they would form complementary hydrogen bonds to the adjacent base. The hydrogen bond between the R group and the adjacent base was considered complementary if the direction of the bond was not more than ten degrees out of the plane defined by the hydrogen bonding angle of the adjacent base. In all cases, with the exception of L-lysine and L-tyrosine (which form hydrogen bonds to O-4 of the adjacent base T), the amino acids were hydrogen bonded to either N-1 of the adjacent purine or N-3 of the adjacent pyrimidine. FIG. 5 lists the complementary hydrogen bonding groups of the hydrophilic amino acid R groups and of the adjacent bases. Because the directionality of hydrogen bonding is known to be an important criterion for the strength of chemical interaction between biological molecules, as exemplified by Watson-Crick base pairs themselves, this appears to be a reasonable criterion for fit.

2. All amino acid R groups were required to come in contact with the adjacent base. This seems a reasonable requirement inasmuch as physiochemical interactions between hydrophobic surfaces require contact within Van der Waals radii.

3. To partially define the steric limitations imposed by the cavities, it has been assumed that any conformation of an amino acid which extends beyond the boundaries of a cavity cannot fit into it. For present purposes, a non-fit is defined as having approximately ten percent or more of the volume of the R group protruding out of the cavity. An example of the fit of histidine into its codon cavity formed from CAC is shown in FIG. 6: A illustrates the cavity formed from CAC (cf. FIG. 3); B shows the cavity with a CPK model of L-histidine inserted; and C shows the cavity with histidine inserted and the potential for hydrogen bonding with the adjacent second base thymine, an arrow indicating the location of the hydrogen bond.
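The three constraints above lend themselves to a simple filter. The Python sketch below is a hypothetical encoding, not the procedure actually used (the fits described here were judged on physical CPK models): each trial placement of an R group is reduced to its hydrogen bond directions, the normal of the adjacent base plane (an assumed reading of "the plane defined by the hydrogen bonding angle of the adjacent base"), a contact flag and the protruding volume fraction. The `rank` helper also illustrates, under an assumed ordering (more complementary hydrogen bonds first, then less protrusion), the ranking by fit and hydrogen bonding used to estimate degree of activity. `Placement`, `is_fit` and `rank` are hypothetical names.

```python
import math
from dataclasses import dataclass

Vector = tuple[float, float, float]

MAX_OUT_OF_PLANE_DEG = 10.0     # constraint 1: bond within ten degrees of the base plane
MAX_PROTRUDING_FRACTION = 0.10  # constraint 3: ~10% or more of R-group volume outside = non-fit


def out_of_plane_angle_deg(bond: Vector, plane_normal: Vector) -> float:
    """Angle (degrees) between a bond vector and the plane with the given normal."""
    dot = sum(b * n for b, n in zip(bond, plane_normal))
    norm_b = math.sqrt(sum(b * b for b in bond))
    norm_n = math.sqrt(sum(n * n for n in plane_normal))
    # Angle to the plane = asin(|b . n| / (|b| |n|)); min() guards against rounding above 1.
    return math.degrees(math.asin(min(1.0, abs(dot) / (norm_b * norm_n))))


@dataclass
class Placement:
    """One trial conformation of an R group in a cavity (hypothetical record)."""
    name: str
    needs_hbond: bool             # R group carries hydrogen bonding heteroatoms
    hbond_vectors: list[Vector]   # R group -> adjacent base hydrogen bond directions
    base_plane_normal: Vector     # normal to the plane of the adjacent base
    contacts_adjacent_base: bool  # constraint 2: contact within Van der Waals radii
    protruding_fraction: float    # fraction of R-group volume outside the cavity


def is_fit(p: Placement) -> bool:
    """Apply the three constraints to one placement."""
    if p.needs_hbond and not p.hbond_vectors:
        return False
    if any(out_of_plane_angle_deg(v, p.base_plane_normal) > MAX_OUT_OF_PLANE_DEG
           for v in p.hbond_vectors):
        return False
    if not p.contacts_adjacent_base:
        return False
    return p.protruding_fraction < MAX_PROTRUDING_FRACTION


def rank(placements: list[Placement]) -> list[Placement]:
    """Keep fits only; order by hydrogen bond count (desc), then protrusion (asc)."""
    fits = [p for p in placements if is_fit(p)]
    return sorted(fits, key=lambda p: (-len(p.hbond_vectors), p.protruding_fraction))


if __name__ == "__main__":
    flat = Placement("in-plane bond", True, [(1.0, 0.0, 0.1)], (0.0, 0.0, 1.0), True, 0.02)
    tilted = Placement("tilted bond", True, [(1.0, 0.0, 0.5)], (0.0, 0.0, 1.0), True, 0.02)
    print([p.name for p in rank([flat, tilted])])  # only the in-plane placement survives
```

The bond (1, 0, 0.1) lies about 5.7 degrees out of the base plane and passes constraint 1, whereas (1, 0, 0.5) lies about 26.6 degrees out of it and is rejected. Treating exactly ten percent protrusion as a non-fit follows the wording of constraint 3.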

There has not, of yet, been an attempt made to specify 
the exact position of the a-carboxylate group of an 
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amino acid in a cavity. Preliminary observadom of the 
fits of CPK models of each amino acid into all sixty-four 
cavities using CPK models of the entire DNA helix 
indicate that the carboxy group can extend in the direc- 
tion of the minor groove of DNA. Using CPK models 
and the NIH X-ray computer graphics system, it has 
also been found that the 2-hydroxy group of tibose in 
double-stranded RNA could serve as a connector to an 
amino acid through an ester linkage (as in t-RNA); the 



Fits of the nucleic acid bases into the cavities are indi- 
cated by stippled patterns. 

fits of amino acids were similar to those observed using DNA cavities.

Using these criteria, the amino acids fit remarkably well into cavities formed from their codons (see FIG. 6). The considerable specificity exhibited by the fits of the amino acids into the cavities constructed from their codons is summarized below and in FIG. 7:

(A) All nineteen L-amino acids with R groups fit into one or more of the sixty-four possible cavities.

(B) All nineteen amino acids fit the cavities associated with their codons (i.e., the cavity formed by removal of the second codon base).

(C) Amino acids fit into cavities associated with their codons in the 5' to 3' direction and not generally in the 3' to 5' direction (obvious exceptions would be symmetrical codons, such as CTC, etc.). This is consistent with the genetic code.

(D) All amino acids having hydrophilic R groups can form complementary hydrogen bonding pairs with the "adjacent" or second base of their anticodons.

(E) When one of the nineteen amino acids cannot be fitted into a specific cavity, a situation that arises in approximately eight percent of the possible 1,280 combinations, the cavity into which it cannot fit is not one associated with any of the codons for that amino acid.

(F) When an amino acid fits into a cavity not associated with any of its codons, the codon from which that cavity is derived often differs in only a single base. When this occurs, the cavities usually appear to be stereochemically related, and in many cases are associated with structurally related amino acids (e.g., tyrosine-phenylalanine; valine-isoleucine; isoleucine-leucine; asparagine-aspartic acid; glutamine-glutamic acid; threonine-serine; threonine-isoleucine).

(G) Cavities derived from two of the three stop codons, TAA and TAG, have unusual topologies which do not accommodate any amino acid.

(H) Glycine, which does not possess an R group, does not fit into any cavity, including those formed by removal of the second base of its codons (GGX).

(I) In general, nucleic acid bases are surprisingly poor fits for many of the cavities derived from models of DNA; a notable exception is guanine, which appears to fit the cavities formed by removal of the second base of the glycine codon, GGX.

FIG. 7 graphically illustrates the preliminary stereochemical fits of L-amino acids into the sixty-four possible "cavities" formed by removal of the middle base (II) of double-stranded triplets in DNA, as in FIGS. 2-4. Dark shaded areas indicate that the amino acid fits into "cavities" formed from its codons; cross-hatched lines indicate alternate fits of amino acids into cavities which are not directly associated with their codons. Unshaded areas indicate that the amino acid does not fit into the "cavity." In all cases except glycine (GGX), the amino acids fit cavities formed from their codons.

These preliminary stereochemical fits are clearly consistent with the overall pattern of codon assignments in the genetic code, which has been based on experiments involving the translation of amino acids from synthetic oligonucleotide templates into oligopeptides. While some of the alternate fits in FIG. 6 do not reflect directly upon the codon catalogue, measurements of the energetics and development of a more accurate three-dimensional topology for defining the cavities should result in more rigorous criteria and, therefore, in greater specificity. The stereochemical precision of cavities might also be improved if it ever becomes possible to construct cavities based on the tertiary structure of RNA.

The only exception to these hopeful generalizations is glycine, which lacks an R group and does not fit into any cavity by the above criteria. The nonfit of glycine into the cavities derived from its codons may be related in some way to the reasonably good fit of guanine into these cavities; guanine is known to have strong stacking interactions with other guanine bases in the DNA double helix and can form three hydrogen bonds when paired with cytosine.

Specific Observations of the Qualitative Fits of Amino Acids on Cavities

L-Proline: fits very well into cavities formed from its four codons (CCX). Proline appears to form complementary hydrophobic contacts with the adjacent second anticodon base, and the R group of the amino acid cannot come in contact with the surfaces of the adjacent base in cavities having C, T, or A as the second anticodon base.

L-Serine: depending upon its conformation relative to the adjacent base, can form complementary hydrogen bonds with either G or C, the adjacent bases within its codon cavities. This condition is met in cavities derived from all six of its codons, TCX, AGT and AGC. Because of the relatively small size of its R group, serine can fit a number of cavities, but it conforms most closely to those formed by four of its six codons (TCX).

L-Threonine: like serine, can form complementary hydrogen bonds with either G (its "adjacent" codon base) or C. It fits into many of the sixty-four cavities, as does serine, but cavities derived from its four codons ACX (cf. isoleucine) provide snug fits (almost lock-and-key) of the chiral R group.

L-Alanine: is the smallest amino acid containing an R group (CH3). Its four codons GCX form very small cavities into which the methyl group of the amino acid fits tightly. Alanine fits very loosely into cavities in which the adjacent second anticodon base is C, T, or A and, therefore, cannot form a complementary surface with those bases.

L-Leucine: can be fitted quite well into all six of its codon cavities; codons TTA and TTG are relatively tight fits, however, and less favorable than CTX.

L-Phenylalanine: was one of the amino acids which had no clearly discernible stereochemical analogy to nucleic acid bases until the above-described cavities were constructed. It fits into cavities formed from its codons TTT and TTC; the aromatic phenyl ring can stack neatly between the first and third bases which border the cavities (both are pyrimidines: T and T, or T and C). Phenylalanine also fits TAT and TAC, the codons for the structurally and metabolically related amino acid tyrosine. Since TAT and TAC form larger cavities than TTT and TTC, phenylalanine fits the tyrosine cavities more loosely than the larger tyrosine molecule does. Conversely, tyrosine will not fit TTT or TTC, because it cannot form appropriate hydrogen bonds with the "adjacent" anticodon base A.

L-Isoleucine: possesses a chiral R group due to a methyl branch at C-3 which has the same stereochemistry as the methyl group of threonine. When fitted into the cavities derived from its codons (ATT, ATC, ATA), isoleucine is very similar in conformation to threonine fitted into cavities derived from its codons, ACX. Both amino acids possess the same first base (A) in their codons; thus, the corresponding cavities also have similar shapes. Isoleucine fits particularly well into cavities derived from two of its three codons, ATC and ATT.

L-Methionine: is a relatively bulky amino acid which fits a number of cavities, including the one derived from its codon ATG, into which it fits tightly. Its analog in RNA, AUG, is the "start" or initiation codon for protein synthesis. While methionine can be considered as forming a hydrogen bond between its sulfur atom and the 6-NH2 of A, the adjacent base in its cavity, it cannot be stated that methionine specifically fits into only one cavity.

L-Valine: is structurally related to isoleucine, having a methyl group at C-3; valine fits into cavities derived both from its own codons and from those for isoleucine.

L-Histidine: can form a complementary hydrogen bond only when the adjacent base to the cavity is T; histidine fits closely into the cavities derived from its two codons CAC and CAT (see FIG. 6). While cavities generated from CAA and CAG, which are codons for glutamine, are listed (FIG. 7) as potential alternate fits for histidine, a portion of the histidine molecule will protrude from those cavities.

L-Glutamine: fits very well into the cavities derived from its two codons CAA and CAG. Glutamine can form two complementary hydrogen bonds to the adjacent base T, at O-4 and N-3.

L-Tyrosine: fits the cavities formed by its codons TAT and TAC uniquely. A hydrogen bond can be formed between the hydroxyl group of the amino acid and the adjacent base T at O-4. The aromatic ring of tyrosine can also stack neatly in an energetically favorable position between the aromatic rings of the neighboring pyrimidines, the first and third codon bases bordering the cavity. This stacking is similar to that observed when phenylalanine is fitted into the cavities derived from its codons.

L-Asparagine: fits reasonably well into the cavities derived from its codons AAC and AAT. It can form a complementary hydrogen bond to the adjacent base T at N-3. Complementary hydrogen bonding to another adjacent base is not feasible. Asparagine can also fit into the similarly shaped cavities for the structurally related amino acid aspartic acid.

L-Lysine: fits into its codon-anticodon cavities so that there is a hydrogen bond between the ε-NH3 moiety and the O-4 of T, the adjacent base. Lysine also fits, albeit poorly, into some cavities in which C is the adjacent base, by hydrogen bonding at O-2 of C.

L-Aspartic Acid: is an excellent fit for its cavities GAC and GAT, where a hydrogen bond can be formed between a carboxylate oxygen and N-3 of T, the adjacent base. Aspartic acid can also fit into one of the cavities derived from the asparagine codons, AAT, as well as into the larger cavities derived from the histidine codons, CAC and CAT.

L-Glutamic Acid: fits into the cavities formed from its codons GAA and GAG, forming a hydrogen bond between a carboxylate oxygen and N-3 of T. Glutamic acid also fits CAA and CAG, codons for the structurally related amino acid glutamine.

L-Arginine: has a large R group in comparison to other amino acids and fits the relatively large cavities derived from all six of its codons, CGX, AGA and AGG. The guanidino group of arginine can mimic guanine in forming two complementary hydrogen bonds with the adjacent base C at N-3 and O-2. The conformation of the arginine side chain when fitted into cavities derived from CGX is different from its conformation in AGA and AGG, where there is a tight fit. There does not appear to be a unique stereochemical fit of arginine to cavities associated only with its codons (e.g., arginine also fits into the cavity derived from AGC, a codon for serine).

L-Cysteine: is a relatively small amino acid, but like serine it has a conformation which will permit a hydrogen bond to N-3 of the adjacent base C in the relatively large cavities formed by its codons, TGT and TGC. However, very little topological specificity is suggested by its fits.

L-Tryptophan: possesses an indole ring which is too large to fit into most of the cavities; it is a tight fit for the cavity formed by its codon TGG. Tryptophan also fits into the cavity derived from TGA, which can also code for tryptophan in yeast and in human mitochondria. TGA is a stop codon in prokaryotic cells. The indole NH can form a hydrogen bond to N-3 of the adjacent base C; cavities having C as the adjacent base are also the largest cavities. The first base of the codon, T, provides a surface large enough for the top of the cavity to accommodate the aromatic rings stacked between T and G (the third base bordering the cavity).

Glycine: is the only amino acid which does not possess an R group or side chain, and hence it does not fit any cavities by the above-described criteria. Relatively large cavities are formed from its codons, GGX. The nucleic acid base guanine is a relatively good fit for these cavities.

STOP Codons: cavities derived from TAG, TAA, and TGA have generally oblique shapes with unique skewed helical topologies. The TAA and TAG cavities in particular are poor fits for all amino acids (see FIGS. 4B and 4C).

Nucleic Acid Bases: fit into cavities poorly, with a few exceptions; for example, guanine can fit into the cavities formed from the glycine codons, GGX. In most cases, the L-amino acids are better fits for their cavities than any of the bases which were removed to form the cavities themselves. Obviously, it would be premature to attach too much significance to this observation until proper quantitative data are available to describe the relative energetics of fits.
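The adjacent-base pattern that recurs in the observations above (A facing the phenylalanine cavities TTT and TTC, T facing the histidine cavities CAC and CAT, C facing the arginine cavities CGX) follows directly from the construction. Removing the middle codon base leaves its Watson-Crick partner, the second anticodon base, exposed in the cavity. The sketch below does that bookkeeping for all sixty-four codons. It is a minimal illustration that assumes codons are written 5' to 3' in DNA letters. The names `codon_cavity`, `CAVITIES` and `BY_ADJACENT_BASE` are hypothetical, and no shape or energetics is modelled.

```python
from itertools import product

# Watson-Crick partners in DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def codon_cavity(codon: str) -> dict:
    """Describe the cavity left when the middle base of a DNA codon is removed.

    Hypothetical helper: records which bases border the cavity and which
    anticodon base is left facing it; it does not model stereochemistry.
    """
    codon = codon.upper()
    if len(codon) != 3 or any(base not in COMPLEMENT for base in codon):
        raise ValueError(f"not a DNA codon: {codon!r}")
    first, second, third = codon
    return {
        "codon": codon,
        "flanking_bases": (first, third),     # first and third codon bases border the cavity
        "removed_base": second,               # base II, removed to form the cavity
        "adjacent_base": COMPLEMENT[second],  # second anticodon base, exposed in the cavity
    }


# All sixty-four possible cavities.
CAVITIES = [codon_cavity("".join(bases)) for bases in product("TCAG", repeat=3)]

# Group codons by the anticodon base their cavity exposes.
BY_ADJACENT_BASE: dict[str, list[str]] = {}
for cavity in CAVITIES:
    BY_ADJACENT_BASE.setdefault(cavity["adjacent_base"], []).append(cavity["codon"])

# Codons that read the same in either direction (first base == third base), e.g. CTC.
SYMMETRICAL = [c["codon"] for c in CAVITIES if c["codon"] == c["codon"][::-1]]

# Twenty amino acids (nineteen with R groups plus glycine) tested against 64 cavities.
N_AMINO_ACIDS = 20
N_COMBINATIONS = N_AMINO_ACIDS * len(CAVITIES)

if __name__ == "__main__":
    print(codon_cavity("TTC")["adjacent_base"])  # A, as for phenylalanine
    print(codon_cavity("CAC")["adjacent_base"])  # T, as for histidine
    print(len(BY_ADJACENT_BASE["C"]))            # 16 cavities expose C
    print(N_COMBINATIONS)                        # 1280
```

The product 20 x 64 reproduces the 1,280 amino acid/cavity combinations mentioned in observation (E). The sixteen symmetrical codons are the only ones for which reading 3' to 5' instead of 5' to 3' gives the same cavity (observation (C)).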
The topological and physiochemical relationships between cavities formed from codon-anticodon sequences in double-stranded B-DNA, the stereochemical fits of amino acids into the cavities, and the physiochemical complementarity of amino acids to the specific cavities associated with their codons are consistent with the pattern of the genetic code. However, it should always be understood that the cavities do not in themselves represent the stereochemical logic of the double-helical structure of B-DNA. The remarkable complementary lock-and-key fits of several of the amino acids into cavities formed from their codons, along with the obvious inability of many alternative amino acid structures to fit any DNA cavities (e.g., aromatic amino acid structures which have one or more methylene groups added to their side chains would be unable to fit into any cavity), suggest that stereochemical features of the genetic code may be related to constraints on the number and structure of the amino acids used for the biosynthesis of proteins.

These preliminary findings may be interpreted as strong evidence that the genetic code has a stereochemical basis, whether or not it is precisely the one proposed above. Although the models of cavities of the present invention represent a new stereochemical approach to the genetic code, there is a large body of prior work which can be considered as supportive. Jungck, for example, found when he made an exhaustive examination of the physical properties of amino acids and nucleic acid bases that the polarity, bulkiness and specific volume of the amino acids (precisely the attributes important for fitting into cavities) could be correlated with the code, J. Mol. Evol., vol. 11, pages 211-224 (1978). Experiments by Weber and Lacey, J. Mol. Evol., vol. 11, pages 199-210 (1978), and by Nagyvary and Fendler, Origins of Life, vol. 5, pages 357-362 (1974), using chromatographic techniques have shown that the polarity of an amino acid can be correlated with that of the second base of its anticodon, suggesting that amino acids might form specific complexes with nucleic acid bases. Wolfenden et al., Science, vol. 206, pages 575-577 (1979), have also demonstrated that the relative hydration potentials of the amino acids can be correlated with the second base in the code.

The importance of the second base position in determining the properties of the amino acids has also been emphasized in a computer generated code by Alff-Steinberger, Proc. Natl. Acad. Sci. USA, vol. 64, pages 584-591 (1969), and by studies of protein structure by Dickerson, J. Mol. Biol., vol. 57, pages 1-15 (1971); Biochem. Biophys. Acta, vol. 119, pages 421-424 (1966); and [...], Doklady Academii Nauk USSR, vol. [...], pages 436-457 (1974). Woese, who has long been a proponent of a stereochemical rationale for the code, also proposed that base-amino acid pairing played a role in the shaping of the code: Biochem. Biophys. Comm., vol. 5, pages 88-93 (1961); Proc. Natl. Acad. Sci. USA, vol. 54, pages 71-75 and 1546-1552 (1965), and vol. 59, pages 110-117 (1968); The Genetic Code: The Molecular Basis for Genetic Expression (Harper and Row, New York) (1967); Bioscience, vol. 20, pages 471-485 (1970); and Naturwissenschaften, vol. 60, pages 447-459 (1973). He also notes the strong correlation between the second base and the amino acid coded for, in that errors in translation occur with frequencies of 1:10:100 in the II:I:III bases of the codon, respectively. The relative effects of changes on the physiochemical properties and topology of the cavities are also unknown.

There are some weaknesses in the above discussion. Perhaps the most significant is that there may be a bias inherent in a prior knowledge of the existence of the genetic code. It would have been preferable, albeit unrealistic, to have first established the genetic code via a stereochemical rationale and then to have subsequently proven its existence in the laboratory with synthetic oligonucleotides. The above stereochemical argument is based upon cavities which are, so far, untested artificial constructs. Their topologies can at present be considered only rough first approximations. In this regard, while continuing studies suggest that the cavities in DNA may be helpful in understanding the structural, metabolic and mutagenic relationships between amino acids, there is no evidence that cavities have ever existed in the evolution of the code or that they have any relevance to the processes of transcription or translation.

X-ray diffraction data are currently being utilized to construct computerized models of the cavities which can be used for calculations of the energetics involved in the fit of amino acids. Several factors will have to be taken into account if the relative degrees of fit of amino acids into a cavity are to be rigorously compared:

(1) the ability of hydrophilic amino acids to form complementary hydrogen bonds (of appropriate distance and angle, analogous to Watson-Crick base pairing) to the adjacent base in the cavity;

(2) the degree of overlap of amino acid R groups with the neighboring first and third bases in the cavity (i.e., the stacking of aromatic rings of amino acid R groups with the aromatic rings of the first and third bases); and

(3) the ability of an R group of an amino acid to fill the cavity and form a complementary surface contact with the adjacent base.

Although the stereochemical relationships between the cavities and the respective fits of CPK models of the amino acids have not yet been fully evaluated by computer methods, it has been possible to show that complexes of amino acids and nucleic acids can be constructed directly from X-ray coordinates using the NIH X-ray computer graphics system. As an example, the fit of L-histidine into the B form of DNA in place of A in the sequence CAC, a codon for histidine, with the entire phosphate-deoxyribose backbone attached, has been shown in a computer-generated X-ray space filling photograph. Computer-generated complexes of amino acids associated with their codons are consistent with the stereochemically complementary fits into the cavities, confirming that certain amino acid side chains are "lock-and-key" fits into DNA cavities. Some side chains are slightly bulkier than the bases originally removed from DNA; the α-carboxy groups are generally oriented in the 5' direction in the minor groove, an orientation which would permit a covalent linkage to the 2' hydroxy of ribose in double-stranded RNA or in an RNA-DNA complex.

It has previously been shown with models that some amino acids may be capable of intercalation between the first two bases of their codons in double-stranded nucleic acids (DNA or RNA), allowing for shifting and pairing of the adjacent (second) base of the anticodon with the amino acid R group; Hendry et al., Persp. Biol. Med., vol. 22, pages 333-345 (1979). Subsequent removal of the entire second codon nucleotide would result in cavities
similar to those described above, albeit with a different configuration of the sugar-phosphate backbone. The interaction of amino acid R groups in such cavities would be the same, whereas the positions of the α-NH3+ and the carboxylate moieties would be different. If evidence that such intercalation occurs is forthcoming, it is suspected that the stereochemical approach described herein will be of value.

The fits of amino acids to the "artificial constructs" derived from their codons can also be used to correlate the structures of amino acid derivatives and their metabolites with biological activity. Amino acids are utilized as precursors for the biosynthesis of other biologically active molecules. Thus, a molecule that is metabolically derived from, and is thus structurally related to, a molecular model of a given amino acid generally can be fitted into the cavity derived from the codon for that amino acid. Examples are the sympathomimetic amines epinephrine, norepinephrine and dopamine, which are derived from the amino acid tyrosine. These and other sympathomimetic drugs have two carbon atoms which separate an aromatic ring from an amino group and which are required for activity. These two carbon atoms are similarly required to fit these pharmaceutical agents into the TTC or TTT codon cavities for tyrosine. Hydroxyl substituents at the 3 and 4 positions on the ring, which are common features of highly active sympathomimetics, facilitate hydrogen bonding within the cavities. Structures possessing a saturated cyclohexyl ring moiety (propylhexedrine) in place of phenyl also fit into the TTT and TTC cavities and are known to be biologically active.

Differentiation of the various classes of sympathomimetic activity with regard to clinical activity (i.e., β2- and α-receptor activity and CNS activity) also appears possible using the cavities. For example:

Fit Cavities (Biologically Active): epinephrine, norepinephrine, dopamine
Do Not Fit Cavities (Not Known To Be Biologically Active): cyclohexylamine, 3-cyclohexylpropylamine

The structures of many antagonists of sympathomimetics can be predicted by assuming that an antagonist must have special structural features which fit the cavity, as well as moieties which either do not fit into the cavity or, when fitted, would distort the cavity. For example, the β-adrenergic blocking agent dichloroisoproterenol (an antagonist) fits part of the tyrosine TAT codon cavity in that it possesses a phenyl group separated by two carbon atoms from an amino group; however, the two chlorine atoms at C-3 and C-4 on the ring are bulkier than the hydroxyls at those positions and would distort the cavity.

In determining the biological activity of a natural or a synthetic compound, the structure of the unknown could be: (1) compared to the structures of known, naturally occurring molecules; and (2) fitted into the cavities already constructed. If the structure is analogous to a naturally occurring molecule, the molecule would then be fitted into the cavity constructed for the naturally occurring molecule. If the molecule is not analogous to a natural product, the molecule would be fitted into the cavities already constructed. If a molecule can be accommodated within an existing cavity as described above, it is likely to have the biological activity associated with the "parent" molecule.

A further utility of the present invention is the design of new biologically active compounds and their metabolites. The cavities can be used to design new compounds having a specific function by constructing models of various structures which will fit into a given cavity and possess the same hydrogen bonding points as the parent molecule. Such molecules can be made from a range of elements and need not contain carbon. For example, silicon might be used instead of carbon.

Additional uses of the cavities of the present invention include the prediction of the metabolism of natural and synthetic products; prediction of the carcinogenic, teratogenic and general mutagenic potential of natural products, existing synthetic products and new products designed using the cavities; prediction of which specific mutations (changes in DNA base sequence) will result in diminished or altered biological activity of proteins or polypeptides; potential therapy for correction of genetic diseases as, for example, the correction of mutations in the structural gene for hemoglobin resulting in sickle cell anemia; design of new proteins and peptides for specific biological functions; prediction and manipulation of the tertiary structure of proteins and peptides (e.g., structures of enzymatic sites, receptors, antigens, etc.); prediction of potential therapy for metabolic and endocrine disorders; design and function of new mutagenic agents and their antidotes; regulation of bacterial and [...] systems to produce new compounds or to produce more (or less) of existing compounds; development of a testing program using the cavities to replace existing animal testing; and use as educational models, including toys and puzzles.

II. "Artificial Constructs" of Stereochemical Complementary Cavities Constructed by Unwinding DNA

A. Construction

Cavities can be constructed between any sequence of base pairs upon unwinding right-handed double-helical nucleic acids, particularly DNA. Using the same degree of unwinding of the helix for any given sequence, only ten cavities are formed; each cavity has a unique size, shape and set of physiochemical features.

The ten cavities were constructed using methods similar to those previously described above regarding the formation of the base-pair [...] images of the space between neighboring base pairs were constructed from space filling [...] pairs utilizing [...] graphics system. The profiles ([...] to the size of Corey-Pauling-Koltun (CPK) models) [...] about the helix axis with uncoiling of the helix axis of 26°. (The stacking of the base pairs in the B form of DNA is a 36° coil; thus the base pairs in our models are coiled 10°.) This degree of uncoiling is consistent with the unwinding angle attributed to some intercalating agents (Miller, K. J., in Biomolecular Stereodynamics II, 1981, R. H. Sarma, ed., Adenine Press, New York, page 469). The separation of the base pair profiles was approximately 6 angstroms. Positive images of the cavity were constructed with "Oasis," which was contoured to the surfaces of the base pairs. The Oasis images were sliced down the center parallel to the base pairs; each of the central slices is unique. The positive "Oasis" images were then dipped in wax for reinforcement and used as molds to make the cavities with RTV silastic polymer. The ten cavities are formed from the following base pairs: AT/TA, TA/AT, TA/TA, CG/CG, CG/TA, AT/CG, GC/CG, TA/GC, TA/CG and CG/GC.

The helix axis, a consistent reference point in the DNA structure, was used to examine the stereochemical relationships of a base in any sequence. The unique symmetrical arrangement of bases in relationship to the helix axis of B-DNA made this feasible.
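Why only ten cavities? A cavity between two stacked base pairs is fixed by the dinucleotide step. A step read along one strand is the same pair of base pairs as its reverse complement read along the other strand, so the sixteen possible steps collapse to ten. The sketch below checks that count. Under one assumed reading of the "XY/ZW" labels above (base pair X·Y stacked on base pair Z·W, with X and Z on the same strand), it also confirms that the ten listed cavities cover each distinct step exactly once. Finally, it records the twist arithmetic from the construction: 36° per step in B-DNA, minus 26° of uncoiling, leaves 10°. The label reading and all function names are hypothetical.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical_step(step: str) -> str:
    """A step and its reverse complement are the same pair of stacked base pairs."""
    return min(step, reverse_complement(step))


def step_from_label(label: str) -> str:
    """Assumed reading of a label such as 'CG/TA': base pair C.G stacked on T.A,
    first letters on the same strand, i.e. the step 5'-CT-3'."""
    upper_pair, lower_pair = label.split("/")
    return upper_pair[0] + lower_pair[0]


ALL_STEPS = ["".join(p) for p in product("ACGT", repeat=2)]      # 16 steps
UNIQUE_STEPS = sorted({canonical_step(s) for s in ALL_STEPS})     # 10 distinct steps

LISTED_CAVITIES = ["AT/TA", "TA/AT", "TA/TA", "CG/CG", "CG/TA",
                   "AT/CG", "GC/CG", "TA/GC", "TA/CG", "CG/GC"]
LISTED_STEPS = [canonical_step(step_from_label(c)) for c in LISTED_CAVITIES]

# Twist bookkeeping from the construction (degrees per base-pair step).
B_DNA_TWIST = 36
UNCOILING = 26
RESIDUAL_TWIST = B_DNA_TWIST - UNCOILING  # 10 degrees, as in the models

if __name__ == "__main__":
    print(len(ALL_STEPS), len(UNIQUE_STEPS))   # 16 10
    print(sorted(LISTED_STEPS) == UNIQUE_STEPS)  # True: each step appears once
    print(RESIDUAL_TWIST)                       # 10
```

Four of the ten steps (AT, TA, CG, GC) are their own reverse complements. This is why the list contains labels such as AT/TA that read the same from either strand.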
B. Utility of the Cavities Formed by Unwinding DNA

If a given cavity is constructed and the complementary fit of a natural or synthetic ("parent") compound has been established, the cavity can then be used to establish the fits of other molecules. Given the structure of a compound, the potential linkages (i.e., hydrogen bonding) of hydrophilic groups to the backbone of the DNA, as well as hydrogen bonding points between base pairs on the upper and lower surfaces of the cavity, can be evaluated; the ability of its shape to be accommodated within the cavity can also be determined. If the molecule has hydrogen bonding points similar to those of the "parent" compound and fits into the cavity, it will very likely have the same biological activity. For example, the synthetic estrogen diethylstilbestrol fits into the same cavity as the potent natural estrogen estradiol, and possesses a conformation (in the cavity) in which the two hydroxyl groups can hydrogen bond to phosphate oxygens on the DNA, as is the case with estradiol. Steroid molecules which fill the cavity and form hydrogen bonds analogous to those formed by estradiol, but which also have portions that extend out of the cavity, would be ideal candidates for antagonists. The antiestrogens tamoxifen and nafoxidine are examples.

A candidate molecule must: (1) fit into a cavity between DNA base pairs without distorting the helix backbone or disrupting the complementarity of the Watson and Crick base pairs; and (2) be capable of forming stereospecific hydrogen bonds between each of the heteroatoms in the molecule and potential hydrogen bonding points in the cavity without distorting the cavity.

Examples of estrogens and other biologically active steroid antagonists which fit into one of the ten cavities are summarized below; related molecules which do not fit and have little or no activity are also listed.

[Table. Column headings: "Fit Cavities: Biologically Active" and "Do Not Fit Cavities: Not Known To Be Biologically Active". Compounds listed: estradiol; unnatural enantiomer of estradiol; progesterone; testosterone; unnatural enantiomer of progesterone; trans-diethylstilbestrol; estrone; 17-α-estradiol; estriol (16-hydroxyestradiol); 2-hydroxyestrone; 2-hydroxyestradiol; 16-α-hydroxyestradiol; lanosterol; 17-methoxymestranol; cholesterol; 3-methoxyestradiol; meso-hexestrol; d,l-hexestrol; dehydroepiandrosterone; 4-androstenedione; cortisol; cortisone; deoxycortisol; ethynylestradiol; retroprogesterone; synthetic progestin; corticosterone; aldosterone.]

The degree of biological activity of compounds can also be determined by simply taking a series of compounds (natural or synthetic), fitting them into the cavities and ranking them according to their fit and complementary hydrogen bonding. Those compounds that fit best have a higher degree of activity than those that fit poorly.

Several tests of the new cavities have been performed. For example, proflavine fits into only one cavity, which is derived from the same sequence that is known experimentally to selectively bind proflavine (Miller, reference above). Also, the correlation of cavity fit and hydrogen bonding to biological activity has been demonstrated with natural and synthetic molecules which are like estradiol.

The fits of several groups of compounds have been tested in the cavities, including: estrogen and related agonists and antagonists; testosterone; progesterone; cortisone; cortisol; mineral corticoids; gibberellic acid; abscisic acid; vitamin A; thyroxine and its metabolites and agonists; prostaglandins; benzpyrene oxide; diethylstilbestrol; thalidomide; sucrose. In each case, the fit [...] the correlation of biological activity with fit, including degree of biological activity, appears to be [...]. Unusual structures whose correlation of structure with biological activity has remained enigmatic [...] relative biological activity of a series of curious estrone [...] [... B. Gabbard, L. F. Hamer and A. Segaloff, Steroids, vol. 37, pages 243-253 (1981)] correlates with fit into DNA in the cavity T-A/G-C (Thymine-Adenine/Guanine-Cytosine).

What we claim is:

1. A method for determining the biological activity of a molecule, comprising the steps of:

(a) preparing a model of a complementary double-stranded codon-anticodon nucleic acid complex;

(b) removing the second base of the triplet sections of said codon to form in each section a space bordered by the remaining bases;

(c) connecting the closest points on the upper and lower surfaces of the bases bordering the space to form a cavity; and

(d) comparing the stereochemical properties of said molecule with each of the cavities to determine a complementary fit, with a fit indicating said biological activity.

2. A method of predicting the biological activity of a molecule by the use of a representation of a nucleic acid, comprising the steps of:
(a) removing the middle base in a codon of a nucleotide triplet of said nucleic acid;

(b) analyzing the physiochemical fit of said molecule and said cavity to determine whether said fit is complementary;

(c) repeating steps (a) and (b) above for a different codon if said fit is not complementary; and

(d) if said fit is complementary, comparing the biological activity of a compound known to fit within said cavity with said molecule to predict the biological activity of said molecule.

3. A method as claimed in claim 2 wherein said nucleic acid is B-DNA.

4. A method of designing a biologically active molecule utilizing a representation of DNA, comprising the steps of:

(a) formulating a model of a proposed molecule;

(b) forming the topology of a cavity created by the removal of the second base of a triplet section of the codon in said representation of DNA;

(c) inserting into said cavity said model of said molecule;

(d) analyzing the physiochemical fit of said molecule within said cavity to determine if said fit is complementary;

(e) repeating steps (b)-(d) above for a successive number of cavities formed from the removal of the second base of a triplet section of the codon in said representation of DNA until a complementary physiochemical fit of said molecule is obtained within a cavity; and

(f) comparing the biological activity of a known compound that has a physiochemical fit within the cavity into which said molecule fits with said molecule to determine the amount of biological activity of said molecule.

5. A method as claimed in claim 4 wherein said analyzing step includes examining the degree of hydrogen bonding of said molecule within said cavity.

6. A method as claimed in claim 4 and further including the step of (g) manipulating the design of said molecule and repeating steps (b)-(d) until the requisite level of biological activity is obtained.

7. A method of predicting the biological activity of a molecule by the use of a representation of a cavity in nucleic acid, comprising the steps of:

(a) preparing a model of double-stranded nucleic acid;

(b) unwinding the nucleic acid model strands to a predetermined degree to form a space;

(c) connecting the closest points on the upper and lower surfaces of the bases bordering the space to form a cavity; and

(d) comparing the stereochemical properties of said molecule with each of the cavities to determine a complementary fit, with a fit indicating said biological activity.

8. A method as claimed in claim 7 wherein said unwinding step includes the step of uncoiling said strands about the helix axis 26°.
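The ranking procedure described in Section B (fit a series of compounds into the cavities and rank them by fit and complementary hydrogen bonding) and the try-each-cavity loop recited in claims 2 and 4 can be sketched as follows. This is a hypothetical illustration, not the procedure as practised.

- The three sub-scores stand in for the three factors listed earlier: hydrogen bonding to the adjacent base, stacking overlap with the first and third bases, and filling of the cavity.
- Their equal weighting, the 0-to-1 scale and the 0.75 cut-off are assumptions.
- Real scores would have to come from model building or energy calculations, which are not shown.
- The compound and cavity names in the usage lines are placeholders, not data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class FitAssessment:
    """Hypothetical scores (0.0 = none, 1.0 = ideal) for one molecule in one cavity."""
    hydrogen_bonding: float  # complementary H-bonds to the adjacent base
    stacking: float          # overlap with the first and third bases
    filling: float           # fills the cavity / complementary surface contact
    distorts_cavity: bool = False

    def score(self) -> float:
        # A fit that distorts the cavity is treated as no fit at all.
        if self.distorts_cavity:
            return 0.0
        # Equal weighting of the three factors is an assumption for illustration.
        return (self.hydrogen_bonding + self.stacking + self.filling) / 3


def rank_by_fit(assessments: dict[str, FitAssessment]) -> list[str]:
    """Order compounds from best to worst fit (best fit read as highest expected activity)."""
    return sorted(assessments, key=lambda name: assessments[name].score(), reverse=True)


def first_complementary_cavity(
    cavities: Iterable[str],
    assess: Callable[[str], FitAssessment],
    threshold: float = 0.75,
) -> Optional[str]:
    """Try cavities in turn until one gives a fit at or above the (hypothetical) threshold."""
    for cavity in cavities:
        if assess(cavity).score() >= threshold:
            return cavity
    return None


if __name__ == "__main__":
    series = {
        "compound_A": FitAssessment(1.0, 0.9, 0.9),
        "compound_B": FitAssessment(0.5, 0.8, 0.6),
        "compound_C": FitAssessment(1.0, 1.0, 1.0, distorts_cavity=True),
    }
    print(rank_by_fit(series))  # ['compound_A', 'compound_B', 'compound_C']

    toy_fits = {"cavity_1": FitAssessment(0.0, 0.9, 0.6), "cavity_2": FitAssessment(1.0, 0.9, 0.9)}
    print(first_complementary_cavity(["cavity_1", "cavity_2"], toy_fits.__getitem__))  # cavity_2
```

Treating a distorting fit as a zero score mirrors the stated requirement that a candidate fit without distorting the cavity. Any weighting that respects that rule could be substituted.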